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LE YANG 

Abstract. In this paper, we define the geometric median of a proba- 
bility measure on a Riemannian manifold, give its characterization and 
a natural condition to ensure its uniqueness. In order to calculate the 
median in practical cases, we also propose a subgradient algorithm and 
prove its convergence as well as estimating the error of approximation 
and the rate of convergence. The convergence property of this sub- 
gradient algorithm, which is a generalization of the classical Weiszfeld 
algorithm in Euclidean spaces to the context of Riemannian manifolds, 
also answers a recent question in P. T. Fletcher et al. [13] 



1. INTRODUCTION 

The classical Fermat point of a triangle in the plane is the point mini- 
mizing the sum of distances to the three vertices. This is a prototype of 
the more general Fermat- Weber problem concerning the same question but 
for more than three points and in higher dimensions. The solution of this 
problem is called the geometric median of these points and provides a notion 
of centrality for them. For this reason, the geometric median is a natural 
estimator in statistics which possesses another important property called ro- 
bustness, that is, not sensitive to outliers. As a consequence, the geometric 
median is a widely used robust estimator in both theoretical and practical 
theory of robust statistics. 

Naturally, one can also ask the question to find a point that minimizes 
the sum of distances to a set of given points in a much more general space 
as long as it carries a distance. This has been done in Sahib [22J who 
proved the existence of the geometric median of a probability measure on a 
complete, separable and finitely compact metric space. Recently, there is a 
growing interest of the method that characterize the statistical data lying 
on a Riemannian manifold and its applications , see for example Barbaresco 
0]) [H]) [Z], Fletcher et al. [11], [12] and Pennec [21] in which the centrality 
of empirical data is modeled by the Riemannian barycenter which was first 
introduced by Karcher [H] and then has been studied by many other au- 
thors, for example see Arnaudon p], [2], [3], Emery [9| and Kendall [15]. As 
is known to all that the barycenter is not a robust estimator and sensitive 
to outliers, in order to overcome this drawback, Fletcher et al. [13] defined 



2000 Mathematics Subject Classification. 58C05, 60D05, 90C25, 90B85. 

Key words and phrases, geometric median, subgradient methods, curvature, Riemann- 
ian manifolds, nondifferentiable optimization, Fermat- Weber point, Weiszfeld algorithm. 

The author is very grateful to his supervisors: Marc Arnaudon and Frederic Barbaresco 
for many helpful discussions and their penetrating comments. The author will also thank 
Thales Air Systems for their financial supports. 

1 



2 



LE YANG 



the weighted geometric median of a set of discrete sample points lying on a 
Riemannian manifold and proved its existence and uniqueness. 

In many cases, especially in practice, one often needs to calculate or 
at least estimate the value of the geometric median. In the case of Eu- 
clidean spaces, Weiszfeld algorithm proposed firstly by Weiszfeld [23], is a 
well known algorithm to do this calculation and has been studied, improved 
on by many other authors, for example see Khun [T6] and Ostresh [20] . In 
the contexts of Riemannnian manifolds, Fletcher et al. [13] proposed a Rie- 
mannian generalization of Weiszfeld algorithm to estimate their geometric 
median, they proved a convergence result under the condition that the man- 
ifold is positively curved, it should be noted that the range of their stepsize a 
should not be [0, 2] but [1, 2], and conjectured the convergence in negatively 
curved case. 

The aim of this paper is to define the geometric median of a probability 
measure on a complete Riemannian manifold and investigate its uniqueness 
as well as its approximation. As in Karcher [T3] and Le [17] . we suppose 
that the support of the probability measure is contained in a convex ball 
and we give a characterization of the geometric median which is proved in 
the case of Euclidean space by Khun [TU] for a discrete set of sample points. 
Then we prove the uniqueness of the median under a natural condition 
imposed on the probability measure and show that this condition yields a 
strong convexity property which is useful in error estimates. By regarding 
the Weiszfeld algorithm as a subgradient procedure, we introduce a sub- 
gradient algorithm to estimate the median and prove that this algorithm 
always converges without condition of the sign of curvatures by generalizing 
the fundamental inequality in Ferreira and Oliveivra [lOj in which it was 
proved in positively curved manifolds. Finally, the results of approximating 
errors and rate of convergence are also obtained. 

Throughout this paper, M is a complete Riemannian manifold with Rie- 
mannian metric ( • , • ) and Riemannian distance d. The gradient operator 
and hessian operator on M are denoted by grad and Hess, respectively. 
For every point p in M, let d p be the distance function to p defined by 
d p (x) = d(x,p). We fix a convex ball B(a,p) in M centered at a with a 
finite radius p, here the convexity of B{a, p) means that for every two points 
x and y in it, there is a unique shortest geodesic from x to y in M that lies 
in B(a, p). The lower and upper bounds of sectional curvatures K in B(a, p) 
are denoted by 5 and A respectively. Since p is finite, 5 and A are also finite. 
// A > 0, we assume further that p < 7r/(4\/A). It is easy to check that the 
following classical comparison theorems in Riemannian geometry: Alexan- 
drov's theorem, Toponogov's theorem and Hessian comparison theorem can 
be all applied in B(a,p), thus it is necessary to introduce some notations 
for model spaces that provide us many geometric informations. 

Notation 1. Let k be a real number, the model space M 2 is defined as fol- 
lows: 

1) if k > then M 2 is obtained from the sphere 8 2 by multiplying the 
distance function by 1/ ^fn; 
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2) if k = then Mg is Euclidean space E n ; 

3) if k < then M% is obtained from the hyperbolic space H 2 by multiply- 
ing the distance function by 1/ yj—n. 

The distance between two points A and B in M 2 will be denoted by 
d(A, B). 

Notation 2. Let k be a real number, then we write for t G R, 

sm(,/K t) / ifn>0; 
S K (t) = < t ifK = 0; 

sinh(-\/— k t) I ^J—k ifn<0; 

We begin with two useful estimations for our purposes, which are almost 
direct corollaries of the Hessian comparison theorem. Observe that the 
second estimation is the core of Le [17] where it is formulated in terms of 
Jacobi fields and is proved using an argument of index lemma. 

Lemma 1. Let p G B(a,p) and 7 : [0, b] — > B(a,p) be a geodesic, then 
i) 

Hessd p (7(*),7(*)) > D(p,A)\r oi (t)\ 2 

for every t 6 [0, b] such that 7(f) ^ p, where D(p, A) = S' A (2p)/ S A (2p) > 
and 7 nor (t) is the normal component ofj(t) with respect to the geodesic from 
p to j(t). 



a) 



Res S -d 2 p m),j(t))<C(p,5M 2 



for every t £ [0,b], where the constant C(p,5) > 1 is defined by 

'\ if5>0; 
2p v /= 5 coth(2 / o v /z ^) if 8 < 0; 



cm 



Proof. Since in B(a,p) we have 5 < K < A, hence by the classical Hessian 
comparison theorem we get, for ^(t) 7^ p, 

5 A (d( 7 (t),p)) l7(t) I ^ H - s ^^)'^))^,(,( 7 (t) !P ))l^) I 

Since S' A (6)/ S<\(9) is nonincreasing for 8 > if A < and for 9 £ (0,7r) if 
A > 0, so the left inequality together with d( 7 (i),p) < 2p proves the first 
assertion. To show the second one, let 7(t) tan be the tangential component 
of j(t) with respect to the geodesic from p to ^(t) then 

Hess \4W),l{t)) = <i(7(i),p)Hessd p (7((),7(*)) + I 7(f)"™ ? 

- max r (7( "- p) 5 i w7W.P)) ' 1 } 171 
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Observe that 9S' S {9)/ S s (9) < 1 for 9 > if 6 = and for 9 G [0, n) if 5 > 0, 
thus for the case when 5 > we have 

if 5 < 0, then 9S' S {9) / S§{9) > 1 and is nondecreasing for # > 0, thus we get 

fA , s' s (d(i(t),p)) < 9 gKgp) 9 /— T ,w 9 ^ 

d(7(t) ' p) 5,(d( 7 (t),p)) - 2P S^P) = 2 P^^H2pV^5) 

hence the estimation holds for every 5 G R. Finally, the case when j(t) = p 
is trivial and the proof is complete. □ 



2. DEFINITION OF RIEMANNIAN MEDIAN 

As in Karcher p3] , we now consider a probability measure /ionM whose 
support is contained in the open ball B(a,p), and define a function 

/: B(a,p) — >R+, x\ — > d(x,p)p(dp) 

JM 

This function is well defined since for every x G B(a,p), 
f(x) = / d(x,p)fi(dp) < 2p 

JB(a,p) 

and here are some simple properties of /: 

Lemma 2. / has the following properties: 

i) f is 1-Lipschitz thus continuous; 

ii) f is convex. 

Proof, i) For every x,y G B{a,p) we have 

\f(x)-f(y)\< \d(x,p) - d(y,p)\p(dp) < d(x , y) p(dp) = d(x , y) 

JM JM 

ii) Let p G B(a,p) and 7 : [0, 1] — > B(a,p) be a geodesic then it suffices to 
show that the function t — > d{^f{t),p) is convex. If p G 7[0, 1], the convexity 
is trivial and if not, it follows form the first estimation in Lemma [TJ 

□ 

Since / is continuous on the compact set B(a,p), it attains its minimum 
here. Now it is time to give these minimum points of / a proper name. 
As the definition of barycenter in Emery and Mokobodzki 9j, we give the 
following definition. 

Definition 1. The set of all the minimum points of f is called the Riemann- 
ian median (or median) of p and is denoted by 971^. The minimal value of 
f will be denoted by /*■ 

It is easily seen that 9Jl^ is compact. Since / is convex then along every 
geodesic, its right and left derivatives exist and now we calculate them for 
later use. 
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Proposition 1. Let 7 : [0, b] — ► B(a,p) be a geodesic, then 

j t f{l{t))\ t=tQ+ = <7(to), fl"(7(*o))> + M{7(to)}|7l, ^ G [0,6) 

|/(7W)| t=t0 _ = <7(<o), ^(7(<o))> -M7(to)}|7l, *o G (0,6] 
where for x G -B(a, p), 

H(x)= [ gradd p (x)p(dp) = [ ~ ™ P \ P p(dp) £ T X M 
Jm\{x} Jm\{x} d{x,p) 

is well defined and satisfying \H(x)\ < 1. 

Particularly, if fi{x} = 0, then grad/(x) = H(x). Moreover, H is continu- 
ous outside the support of \x. 

Proof. We prove only the first identity since the proof of the second one is 
similar. For this, let to G [0, 6) and e G (0, 6 — to], then 

/( 7 (t + e)) - /( 7 (t )) _ / d( 7 (t + e),p) - d (-y(to),p) 



£ Jm e 

d( 7 (to + g ),p)-d( 7 (t ),p ) /iW + ^ )}| . { 

M\{i(t )} £ 
By letting e — > 0+ and using the bounded convergence to the above first 
integral we obtain that 

|/(7(0)U + -/ |^7(*).1-)UM*)+/K7(*.»W 



( 7 Oo), gradd p ( 7 (t )))M(#) + m{t(*o)}|t| 

AA{7(*o)} 

= ( 7 (t ), #( 7 (i ))) + M{ 7 (*o)}| 7 | 

□ 

Now we give a characterization of 9H M which is proved in Khun |16| for 
the case when /i is a finite set of points in an Euclidean space. 

Theorem 1. = {x G fl(a,p) : |fT(x)| < p{x}} 

Proof. ( C ) Let x G if H(x) = there is nothing to prove, so we assume 
that H(x) ^ 0. Consider a geodesic in B(a,p): 

7 (t) = exp^-t |f|4), tG[0, 6] 
according to the definition of 9Jt^, t = is a minimum point of / o 7 , thus 

l/w«))U>o 

Moreover, by Proposition [T] we have 

!/(7(*))| t=0+ = ( - |f|jj> ) + MM = + /u{x} 

thus we have < p{x}. 



6 



LE YANG 



(d) Suppose that \H(x)\ < p{x}, then for every y G B(a,p), we con- 
sider the geodesic 7 : [0,1] — > B(a,p) with 7(0) = x and 7(1) = y, then 
Proposition Q] and Cauchy- Schwartz inequality give that 

^/(7(*))| t=0+ = (7(0),^)>+M{x}|7l 
>|7|(-|fT(x)|+M{a:})>0 
By Lemma [2j / o 7 is convex then we have 

f(y)-f(x)>j t f( 1 (t))\ t=0+ >o 

hence we get x G 9Jt M . 

□ 

In order to describe the location of the median of p, we need the following 
geometric lemma which is also useful in the next section. 

Lemma 3. Let AABC be a geodesic triangle in B(a, p) such that ZA > tt/2, 
then ZB < tt/2 and ZC < tt/2. 

Proof. It suffices to prove that ZB < ir/2 . We show this only for A > 
since the cases when A < are similar. Let d(B,C) = a, d(A,C) = b and 
d(A, B) = c. We consider a comparison triangle AABC of AABC in Ma, 
since K < A in B(a,p), hence by Alexandrov's theorem we get ZA < ZA 
and ZB < ZB. The law of cosines in together with cos ZA < yields 
that cos(\/Aa) < cos(\/A6) cos(\/Ac). Using the law of cosines again we 
get 

. - cos(vA^) — cos(\/Aa) cos(\/Ac) 
cos ZB = — 



> 



sin(vAa) sin(vAc) 

cos(VA6) - cos(\/Ab) cos 2 ( v / Ac) 

sin(-\/Ao) sin(\/Ac) 

cos(\/A&) sin(\/Ac) 

— > 



sin(yAa 



thus ZB <ZB < tt/2. 



□ 



Proposition 2. 971^ is contained in the smallest closed convex subset of 
B(a,p) containing the support of p. 

Proof. Let V be this set and by Theorem [1] it suffices to show that if x G 
B(a, p)\V then H(x) ^ 0. In fact, let y be a point in V such that d(x, y) = 
mi{d(x,p) : p G V}, then by the convexity of V we get Zxyp > tt/2 for 
every p G V and hence Lemma [3] yields that Zpxy < tt/2. This gives that 

Jv d{x,p) 

= —d(x, y) I cos Zpxy < 
Jv 

The proof is completed by observing that exp" 1 y ^ 0. 

□ 
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3. UNIQUENESS OF RIEMANNIAN MEDIAN 

In the Euclidean case, if the sample points are not colinear, then the geo- 
metric median is unique. Hence we get a natural condition of p to ensure 
the uniqueness of the median in Riemannian case: 

* The support of p is not totally contained in any geodesic. This means 
that for every geodesic 7: [0,6] — ► B(a,p), we have //(7[0, &]) < 1. 

Theorem 2. If condition * holds, then SDT^ has a single element. 

Proof. We will prove this by showing that / is strictly convex, that is, for 
every geodesic 7: [0, b] — > B(a,p), the function / o 7 is strictly convex. In 
fact, without loss of generality, we may assume that 7(0) and 7(6) are both in 
dB(a, p). By the first estimation in Lemma[TJ for every p € B(a, p)\j[0,b] 
the function 1 1— > d(-y(t),p) is strictly convex and for p £ 7[0, b] it is trivially 
convex. Since the condition * yields that p(B(a,p) \ 7[0, b]) > 0, thus 
by integration we obtain the strict convexity of / and this completes the 
proof. □ 

In the above proof, we have seen that / is strictly convex if condition 
* holds. However, things are better than this. In fact, we can show that 
the condition * implies the strong convexity of /. In fact, the compactness 
of B{a, p) and the condition * can give a stronger result and the following 
lemma clarifies this. 

Lemma 4. // condition * holds, then there exist two constants S (0, p) 
and rjfj, £ (0, 1] such that for every geodesic 7: [0, b] — > B(a, p) we have 

where for e > 0, £(7, e) = {x G B(a, p) : d(x, j[0, b]) < e}. 

Proof. If this is not true, then for every e £ (0, p) and rj € (0, 1], there exists 
a geodesic 7: [0,6] — > B(a,p) such that p{B(^,e)) > 1 — 77. Then we obtain 
a sequence of geodesies ( r j n ) n '- [0,6] — ► B(a,p) verifying fjL(B(j n ,l/n)) > 
1 — 1/n for sufficiently large n. Since B{a,p) is compact, there exists a 
subsequence {^n k )k and a geodesic 7: [0,6] — » B(a,p) such that -y nk — ► 7 
uniformly on [0,6]. Then for every j > 1, when k is sufficiently large we have 
B('y nk ,l/nk) C B(j, 1/j), hence for these k we have fj,(B(^y, 1/j)) > 1 — 1/uk, 
by letting k — > 00 we get p{B{^, = 1 and then by letting j — * 00 we 
get ^(7(0,6]) = 1 which contradicts the condition *. □ 

Lemma 5. Let {/\AiBiCi)i = \^ be two geodesic triangles in model space M% 
such that ZA 1 < ZA 2 , ZB 1 < tt/2, d(A x ,Cx) = d(A 2 ,C 2 ) and d(A 1 ,B 1 ) = 
d(A 2 ,B 2 ). ThenZC 2 <Zd. 

Proof. We prove the lemma for the case when k > 0, the proof for the cases 
when k < is similar. Let d(Bi,C\) = a%, d(B 2 ,C 2 ) = a 2 , d(A±,Ci) = 
d(A 2 ,C 2 ) = 6 and d(A 1: B 1 ) = d(A 2 ,B 2 ) = c. Since ZA\ < ZA 2 , hence 
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a>i < 0,2- By the law of cosines in M 2 , we get 

d ^ d cos(- v /kc) — cos(- v /Ka2) 003(^^6) 
dci2 2 da>2 sin(y7c&) sin (yfKaz) 

y/n{cos{y/nb) — cos{y/Ha2) cos(- v /kc)) 

sin(y / K6) sm 2 (y/Ra2) 
V / k(cos(v / k6) — cos(- v /Kai) cos(i/kc)) 
sin ( y/~Kb) sin 2 ( y^KG^ ) 

y / Ksin(y / Kai) sin(y^ic) 

= — ■ / /-u\ ■ n 1- \ CUH 
sm(y ko) sm (VKO2J 

so that COSZC2 is nondecreasing with respect to <i2 when a2 > a±, thus we 
get cos ZC2 > cosZCi, i.e. ZC 2 < ZC X . □ 

Lemma 6. Let AABC be a geodesic triangle in B(a, p) such that ZA = tt/2 
then we have 



cos ZSi > 



sin ZC > L(p,5)d(A,B) 



where the constant 

' 2 y/6/( it \Jl- cos 4 (2 py/5)) if 5 > 0; 

L(p,5) = \l/(2V2p) if 5 = 0; 

5 /J 'cosh 4 (2 - 1 «/ <5 < 0; 

Proof. Let (AAjBjCj)^!^ be two geodesic triangles in M| such that A^i-BiCi 
is a comparison triangle of AABC and AA2B2C2 verifies that d(A2,C2) = 
d(A,C), d(A 2 ,B 2 ) = d(A, B) and ZA 2 = ZA. Assume that d(B,C) = a, 
d(A,C) = b, d(A, B) = c and d(B2,C2) = 0,2- By Lemma [3] we get that 
/LB < it/2. Since K > 5 in B(a,p), then by Toponogov's theorem we get 
ZA\ < ZA = ZA2 and ZB\ < ZB < tt/2, hence by Lemma [5] we ob- 
tain that ZC2 < ZC\. Observe that we have also ZC\ < ZC < tt/2, thus 
sinZC > SU1ZC2. Now it suffices to estimate sinZC2 according to the sign 
of the lower sectional curvature bound 5: 

If 5 > 0, using ZA2 = tt/2 and the laws of sines and cosines in M? we get 

SU1ZC2 = ^-=- - and cos(\/<5a2) = cos(V5b) cos(\/ r 5c) 
sin(v 80,2) 

hence we obtain that 

sui{yf5c) 2\/Sc 
smZC2 = — ====== > 



I- cos 2 (Vd~b) cos 2 (V5c) TT\Jl-cos i (2pV5) 

since b,c<2p and sin 6 > 26/tt for 9 G [0, tt/2]. 
Using the same method as above, we obtain that 

If 5 = 0, sin ZCo = — r ° > 



Vb 2 + c 2 ~ 2y/2p 

rt x / n ■ /n sinh(^/^c) V^5c 
If < 0, sm Z60 = — > 



cosh 2 (v/=56) cosh 2 (-/=£c) - 1 A /cosh 4 (2p v / ^) - 1 
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since sinh# > 9 for 9 > 0. □ 



We know that / is convex, thus along every geodesic it has a second 
derivative in the sense of distribution, the following proposition gives its 
specific form as well as the Taylor's formula. 

Proposition 3. Lei 7: [0,6] — > B(a,p) be a geodesic then for t £ [0,6] 
/(7(i))=/(7(0)) + ^-/(7(^))| s=0+ *+ / (t-s)v(ds) 

where v is the second derivative of fey on (0,6) in the sense of distribution, 
i.e. a bounded positive measure, which is given by 

v=( / Hessdp(7,7)/i(dp) ) ■ A| (0i6 ) +2|7| ■ (/io 7 )| (0ib) 

\JM\i[Q,b] J 

with A | (o,6) and (/u°7)|(o,6) denoting the restrictions of Lebesgue measure and 
the measure \x o 7 on (0,6) respectively. 

Proof. Firstly, observe that since 7 is a homeomorphism of (0, 6) onto its 
image, fi o 7 is a well defined measure on (0, 6). By Proposition [TJ we get 



M\y[0,b] 



Af\ 7 [0,6] 



d d 2 \ 

d(j(0),p) + —d(j(s),p)\ s=o t + J (t-s)^d^(s),p)ds\fi(dp) 

r — exp _ ,"L p 

d( 7 (0),p)/x(dp) + ( 7 (0), / Jf J°\ n{dp))t 



Af\7[0,6] ' JM\y[0,b] d(-y(0),p) 

+ / (t - s)ds / Hessdp( 7 (s),7(s))/i(dp) 

j0 JAf\7[0,6] 



= /(7(0))- / d( 7 (0),p)//(dp) 
./ 7(0,6] 

^ r — exp _ ,L » 

+ ^/(7(^))L_ 0+ t-(7(0), / 7(0) M (#))t- At { 7 (0)}|7|t 

+ / (t - s)ds / Hessd p (7(s),7(s))/i(c(p) 

j0 JM\7[0,fe] 

Since 



/(7(t))= / d( 7 (t),p)/i(dp)+ / d(ry(t),pMdp)) 

JM\y[0,b] Jf[0,b] 
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we obtain 



/(7(t))-/(7(0))-^-/(7(s))L =0+ *- [\t-s)ds [ Hessd p ( 7 ( S ), 7 ( S ))//(dp) 

ds IS " U+ Jo 7m\ 7 [0,6] 



d( 7 (0)^H*)+M7(0,6])|7|i-M7(0)}|7|i+ / d(j(t),p)p(dp) 

7(0,6] - / 7[0,t>] 



d(7(0),p)M(dp) + / d( 7 (0),7(t)Mdp) + / d(j(t),p)v(dp) 
7(0,6] -'7(0,6] -'7(0,6] 



d( 7 (0),p)//(dp) + / d( 7 (0), 7 (t)Mdp)+ / d(j{t),p)ti(d P ) 

7(0,t) ^7(0,*) -'7(0,i) 

+ (-/ d( 7 (o),pKdp)+ / d( 7 (o), 7 (t)Hdp)+ / d^(t),p)n(d P ) 

\ Jf[t,b] J"f[t,b] Jf[tM 

= 2 / d( 7 (*),p)/i(rfp) + = 2| 7 | / (t - o 7 )(d s ) 

-'7(0,4) J(0,t) 

hence the desired formula holds. To show that f is the second derivative of 
/ o 7 on (0, 6) in the sense of distribution, let y> G C£°( 0, 6) and by Fubini's 
theorem and integration by parts we get 

(0,6) 

:/( 7 (0)) / <p"(t)dt + A/( 7 ( s ))| / V(t)dt + / ^(t)dt f (t- s)u(ds) 
7(0,6) OS S ~ U+ 7(0,6) 7(0,6) 7(0,t) 

i/(ds) / (t- s)<p"(t)dt = / y>(s)i/(ds) 

(0,6) 7(3,6) 7(0,6) 

the proof is complete. □ 

Now we are ready to show the strong convexity of / under condition * 
which will be useful for our error estimates. Certainly, this also yields the 
uniqueness of the median. 

Theorem 3. If condition * holds, then for every geodesic 7: [0,6] — > 
B(a,p), we have the following inequality 

/( 7 (i)) > /( 7 (0)) + f s f(7(s))\ s=0+ t + r| 7 | 2 t 2 , t 6 [0,b] 
where the constant r = (1/2) e 2 r/ M D(p, A) L(p, S) 2 . 

Hence f is r -convex. That is, for every unit velocity geodesic 7 : [0,6] — > 
B(a,p), the function 1 1— ► /( 7 (i)) — ri 2 zs convex. 

Proof. Without loss of generality we may assume that 7 (0) and 7 (6) are 
both in dB(a,p). Then by the first estimation in Lemma [T] we obtain that 
for every s £ [0, 6], 

/ Hessd p ( 7 (s),7(s))/i(dp)> / D(p, A)\f™ (s)\ 2 p,(dp) 

JM\~f[0,b] JM\~r[0,b] 



D(p,A)\j\ 2 f sin 2 Z( 7 (s),exp Kp)p(dp) 

7m\ 7 [0,6] 
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Then for every p S M \ 7[0, 6], let q = q(p) be the orthogonal projection 
of p onto 7. If 7(5) 7^ q, the triangle Apqj(s) is a right triangle with 
Zpq'j(s) = it/2. Hence Lemma [6] yields that 

sinZ(7(s)),exp~ 1 s) p) > L(p,5)d(p,q) 
thus by Lemma [3] we obtain 

Hessdp(7(s),7(s))/z(dp) 

Af\7[0,6] 

>D(p,A)\j\ 2 L(p,S) 2 f d 2 (p,q)p(dp) 
JM\y[0,b] 

>D(p,A)\>y\ 2 L(p,6) 2 [ d 2 {p,q)p(dp) 

J M\_B(7,e M ) 

>el^D(p,A)L(p,S) 2 \^\ 2 = 2r\j\ 2 
Then by Proposition [3] we get that for t G [0,6], 

/(7W) = /(7(0)) + ^f(l(s))\ s=0+ t + [ (t- s)v(ds) 

>/(7(0)) + #/(7(5))L = nJ + 2r|7| 2 [ (t - s)ds 



>/(7(0)) + ^/(7( 5 ))| s=0+ i + r| 7 | 2 t 2 



(o,t) 



hence the desired inequality holds. To show the r-convexity of /, let 7: 
[0, b] — > B(a, p) be a unit velocity geodesic, then the above inequality shows 
that the function f(j(t)) — rt 2 has an affine support on every t € [0, b], so 
that it is convex. □ 

4. A SUBGRADIENT ALGORITHM 

To begin with, we recall the notion of subgradient for a convex function on 
a Riemannian manifold. For our purpose, it suffices to consider this notion 
in a convex subset of the manifold. 

Definition 2. Let U be a convex subset of M and h be a convex function 
defined on U . For every x G U, a vector v £ T X M is called a subgradient of 
h at x if for every geodesic 7: [0,6] — > U with 7(0) = x, we have for every 
t £ [0,6], 

h(r((t))>h(x)+ (7(0), v)t 

Our idea to approximate the Riemannian median of p by the subgradient 
method stems from the following simple observation. 

Lemma 7. For every x G B(a,p), H{x) is a subgradient of f at x. 

Proof. Let 7: [0,1] — ► B(a,p) be a geodesic such that 7(0) = x, then by 
Proposition Q] together with the convexity of / we get for every t G [0,1], 

/(7(*))>/(7(0)) + ^/(7^))| s=0+ i 

= /(*) + ((7(0), H(x))+p{x}\j\)t 
>f(x)+ (7(0),tf(x))t 
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thus the assertion holds. 



□ 



In order to introduce our subgradient algorithm we need the following 
notations. 

Notation 3. Let x £ B(a,p) with H(x) 7^ ; then we write 



We can now describe our subgradient algorithm to approximate the Rie- 
mannian median of the probability measure p. 

Algorithm 1. Subgradient algorithm for Riemannian median: 
Step 1: 

Choose a point xq G B(a, p) and let k = 0. 
Step 2: 

If \H(x k )\ < p{x k }, then stop and let m = Xk- 
If not, then go to step 3. 
Step 3: 

Choose a stepsize tk E (0, r Xk ] and let Xk+i = 7x fc (ifc), then come back to 
step 2 with k = k + 1. 

Observe that according to the definition of r x , the sequence (xfc)fc is con- 
tained in the ball B(a, p). Now we turn to the convergence proof of the above 
algorithm under some conditions of the stepsize. It is well known that the 
following type of inequalities are of fundamental importance to conclude the 
convergence of the subgradient algorithms in Euclidean spaces: 



see for example, Correa and Lemarechal [8], Nedic and Bertsekas [19]. For a 
positively curved Riemannian manifold, Ferreira and Oliveira [10] obtained 
a generalization of the above inequality by using Toponogov's comparison 
theorem. But their method is not applicable in our case since we do not 
suppose that 5 > 0. However, we can still obtain a similar result using a 
different method. 

Lemma 8. If H{xj t ) ^ ; then for every point y £ B(a,p) we have 



j x (t) = exp x (-t ) , t > 

r x = sup{t G [0,2p] : 7^) G B(a,p)} 




\\x k+ i - y\\ 2 <\\x k - y\\ 2 + Mt\ + A 2 



2t k 



(f(y)-f(x k )) 



d 2 (x k+ i,y) < d 2 (x k ,y) + C(p,5)t 2 k + 



2t k 



(/(!/) " f(x k )) 



H(x k )\ 



Particularly, 



d 2 ( Xk+1 ,m ll ) < d 2 (x k ,m ll ) + c(p,5)t 2 k + 2tk{f* - f{x k )) 
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Proof. By Taylor's formula and the second estimation in Lemma [H there 
exists £ G (0, r x ) such that 

1 1 

-<i 2 (x fe+ i,y) = -d 2 (7o; ft (tfc),y) 



\d\ lxk {t),y) 



t J k+ 2dt* 



2 

7}d 2 (x k ,y) + {j Xk (0), grad^dl(x k ))t k + - Hess - 



.1,2, \ , (H(x k ), exp x ly) C(p,5) +2 
<^d(x k ,y) + ^-^ t k + —j- t„ 

By Lemma El H(x k ) is a subgradient of / at point x k and hence 
( H(x k ), exp^ 1 y ) < f(y) - f(x k ) 

thus we get 

\d\x k+l ,y) < \d\x k ,y) + J^ifiv) ~ /(**)) + ^-tl 

then the first inequality holds. The second one follows from /* < f(x k ) and 
\H{x k )\ < 1. □ 

As in the Euclidean case, once the fundamental inequality is established, 
the convergence of the subgradient algorithm is soon achieved, see for ex- 
ample Correa and Lemarechal [8], Nedic and Bertsekas [19] . Since our fun- 
damental inequality is very similar to the one in Euclidean case, the proof 
of convergence is also very similar. 

Theorem 4. If the stepsize (t k ) k verifies 

oo 

lim t k = and > t k = +oo 

then we have 



fc^oo 

k=0 



lim d(x k ,dJl tl ) = and lim f(x k ) = /* 

fc— >oo ' fc— >oo 

Moreover, if the stepsize (t k ) k verifies also that 

oo 

< +°° 

fc=0 

then there exists some m G 9Jt^ suc/i that x k — > m, when k — > oo. 

Proof. If |-ff(a?j.)| < ^{xfc} for some A; > 0, then Theorem [T] yields that 
Xfc G 50?^ and the the desired result is trivially true. Hence in the following 
we assume that \H(x k )\ > fi{x k } for every k, this means that none of x k is 
in the median of fi. We will show firstly that 

liminf f(x k ) = /* 

fc— +oo 

If this is not true, then there exist N\ G N and r/ > such that for every 
k > Ni we have /* — /(x^) < —77. Since > 0, then Lemma [8] gives 
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that 

d 2 (x k+1 , 97T M ) < d 2 {x k , + t k (C(p, 6)t k - 2rj) 

Since lim^oo t k = 0, we can suppose that C(p,5)t k < i] for every k > N\ 
and then 

d 2 (x k+1 ,M fl ) < d 2 (x k ,M^) -r]t k 
by summing the above inequalities we get 

k 

n^u< d 2 (x Nl ,m ll ) -d 2 {x k+1 ,m fl ) < d 2 {x Nll m ll ) 

i=Ni 

which contradicts with X^A^=o^ fc = this proves the assertion. 

Now for fixed e > 0, there exists N2 G N such that C(p,5)t k < 2e for 
every k > N2. We consider the following two cases: 
If f{x k ) >/* + £, then by Lemma [8] we obtain that 

d 2 {x k+x ,m tl ) < d 2 (x k ,m fl ) + c( P ,5)t 2 k + 2t fc (/» - f(x k )) 

< d 2 (x k ,M^) + (C(p,5)t k - 2e)t k < d 2 {x k ,Wl^) 

If f{x k ) < /* + e then x k £ L £ = {x £ B(a,p) : f(x) < /* + e} and if we 
write l £ = sup{d( y, 971^) : y G L £ }, hence in this case we have 

d{x k+ i, 97T M ) < d(x fc+ i, + 97t^) <t k + l £ 

In conclusion, we always have that for k > iV2, 

^(xfc+i,^) < max{d(xA,.,9Jt^),tA: + 

By induction we get for every n > k, 

d(x n+1 ,M IJ .) < max{d(x k ,dXt^),max{t k ,t k+ i, ...,t n } + l E } 
< max{d(x k ,dJl^),sup{ti : i > k} + l E } 

thus we get 

limsup(i(x n , 97?^) < max{d(x k , 971^), sup{tj : i > A;} + Z e } 

n^oo 

liminffc^oo /(xa,.) = /* yields that limmi k ^ QO d(x k ,Tl^) = 0, by taking the 
inferior limit on the right hand side we obtain 

limsup d(x n , 97^) < l e 

n^oo 

Now we show that l £ — > when e — > 0. By monotonicity of Z e , it suffices to 
show this along some sequence. In fact, observe that L £ is compact, thus for 
every e > 0, there exists y £ G L £ such that l £ = d(y £ ,Wl^). Since B(a,p) is 
compact, there exist a sequence — ► and y G B(a,p) such that y £fe — ► y. 
Since f{y £k ) < /* + we have /(y) < /* and hence y G 97t M . Consequently 
/ £fe — > 0. Thus we get d(x k ,Tl^) — > and this yields /(xfc) — > /*. 
If X^o^it < the compactness of B(a,p) and f(x k ) — ► f* imply that 
the sequence (x k ) k has some cluster point m G 97T M , hence Lemma [8] yields 

<i 2 (x fc+ i, m) < (i 2 (x fc , m) + C(p, <5)tf. 



i=k 



oo 

j.2 



RIEMANNIAN MEDIAN AND ITS ESTIMATION 15 

for every n > k, by summing the above inequalities we get 

n 

d 2 (x n+ i,m) < d 2 (x k ,m) + C(p, 5) ^ t 2 
and by letting n — > oo we obtain 

lim sup d 2 (x n ,m) < d 2 (x k , m) + C{p,8) t 2 

n— >oo . , 

i=k 

the proof will be completed by observing that the right hand side of the 
above inequality posseses a subsequence that converges to 0. 

□ 

Now we have to consider the choice of stepsize. In fact, we can choose 
(ifc)fc that verifies the conditions of the preceding theorem thus yields the 
desired convergence of our algorithm and this is justified by the following 
lemma. 

Lemma 9. infjr^ : x G B(a,p)} > 

Proof. Since the support of p is contained in B(a,p), if x G dB(a,p), H[x) 
is transverse to dB(a,p) and hence r x > for every x G B(a,p). Moreover, 
there exists e > such that supp(/x) C B(a, p — e), then for x G B(a, p — e), 
we have r x > p — d(x, a) > e. On the other hand, H is continuous on 
B(a, p)\B(a, p — e) which is compact, thus there exists a point xq G B(a, p)\ 
B(a, p — e) such that inf{r x : x G B(a, p) \ B(a, p — e)} = r XQ > 0. Hence we 
get inf{r x : x G B(a,p)} > min{e, r XQ } > 0. □ 

According to this lemma, we may take for example the stepsize tf. = 
r Xk /(k + 1) for every k > 0, then by Theorem U] the convergence holds. But 
the drawback is that we do not know r Xk . However, with further analysis 
we can give an explicit uniform lower bound of them. In the following, let 

a = sup{d(p, a) : p G supp^x} 

and observe that supp(/x) C B(a,p) yields a < p. 

Lemma 10. For every x G B{a,p) \ B(a,a) we have 

2d(x, a)S&(d(x, a) — a) 

TV > 



C(p, 5)S 1 \(d(x, a) + a) 

Proof. Since the diameter of B(a,p) is 2p, then 7 x (r x ) G dB(a,p). By 
Taylor's formula and the second estimation in Lemma [lj there exists £ G 
(0, r x ) such that 



\p 2 = \d 2( C1x{r x ),a) 



Id 2 
2dt 

1 .„ 1 .„ , , , 1 „ 1 



1 72/ \ d 
2 d ^ a) + Jt 



\d 2 ^ x (t),a) 



t J x+ 2dt* 



\d 2 { lx {t),a) 

J t= ; 

■2 1 



r 2 



d (x,a) + (73,(0), grad -djx) ) r x + -Hess ^d a (j^), i(g))r x 

1 2 (H(x), exp^q) C(p^) 2 

S 2 d(X ' aj+ |i?(x)| x+ 2 ^ 
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Gauss Lemma yields that ( exp^, 1 p, exp x 1 a ) > for p G supp p, hence 

( J7W, exp-> ») - - £^ <eXPJ ^7''°V w < 
Combine this with d(x, a) < p, C(p, 5) > and r x > 0, we obtain that 



^ 1 / (H{x), exp/g) (H(x), exp^g) 2 

r * - com) i ^Mi — + V — WW — + {pMp ~ { , )) 

> 1 f _ (H(x), exp^a) |(#0), exp^a)! 



C(p,<5)l |ff(x)| 
-2 (H(x), exp" 1 a) -2 _ x 

cm — fer^ - cm {h{x) > exPx a) 



— f 

J supp /I 

-d(Xja) I cos Zpxap(dp) 

'j "/ J supp /I 



■//(dp) 



For every p G supp(/u) we consider the triangle Apxa and its comparison 
triangle Apxa in M 2 ^. Since if < A in B(a,p), then Alexandrov's Theorem 
implies that Zpxa < Zpxa and hence cos Zpxa > cos Zpxa. Now it suffices 
to estimate the lower bound of cos Zpxd according to the sign of A. In fact, 
we have for every A G R, 

___ . S A (d(x,a) - a) 

cos Zpxa > — — — — 

b A {d{x, a) + a) 

We show this for the case when A > 0, the proof for cases when A < 
is similar. For this, let us observe that d(a,p) < a and that d(x,a) — a < 
d(x,p) < d(x,a) + a, then we obtain 

cos(y/~Kd(a,p)) — cos(v / Ai(x, a)) cos(y r Kd(x,p)) 



cos Apxa 



sin(\/Ad(x, a)) sin(\/Ad(x,p)) 
cos(\/A(t) — cos(\/Ad(x, a)) cos(\/A(d(x, a) — a)) 



> 

sin(V Ad(x, a)) sin(V A(d(x, a) + a)) 

sin(\/ r K(d(x , a) — a)) S A (d(x , a) — a) 

sin(\/A(d(x, a) + a)) S A (d(x, a) + a) 

hence the claimed inequality is proved and consequently, 

2d(x,a) f S A (d(x,a) — a) 2d(x, a)S A (d(x, a) — a 

r x > ~n — FT / a (M r~l tM«P) " 



C(p,S) 7 SU p PAt S A {d(x,a) +cr) C(p, 5)S A (d(x, a) + a) 

□ 

Now we can give the desired lower bound. 

Lemma 11. For every x £ B(a,p) we have 

p-a 



r x > 



C(p, 5) cosh(2p^Z\|) + 1 
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Proof. Firstly, assume that x G B(a,p) \ B(a,o~) then by the preceding 
lemma, if A > we have 

^ 2d(x, a) sm(\fA(d(x, a) — a)) 
Tx ~ C(p,5) sin(v / A(d(x,a)+ ( r)) 

observe that < y/A(d(x,a) + a)) < 2p\fK < tt/2 and that for < u < 
v < vr/2, we have (sinii/sinw) > (u/v), then we obtain 

2d(x, a) d(x, a) — <r 2<i(x, a) a) — a a) — a 

Tx ~ C{p,6) d(x,a)+a ~ C(p,5) 2d{x,a) ~ C(p, S) 

On the other hand, we always have r x > p — d(x, a) and hence 

d(x, a) — a 
r x > max{/) - d(x, a), —— — ^— } 

Observe that 

f d(x,a)—a 1 p — a 

mm < maxjp — d(x, a), — — — — — } : a < d[x, a) < p 



C(p,8) v ' '-f] C(p,8) + 1 

hence we get 

p- a 

r x > 



C(p,S) + l 
If A = 0, then we have 

2d(x, a) d(x, a) — a 



r x > 



C(p, 5) d(x, a) + a 

the same proof as above yields the same result. 
If A < then we have 



2d(x, a) sinh(V— A(d(x, a) — a)) 
Tx ~ C(p,6) smh(^/^A(d(x,a) 

Observe that for u > 0, sinhu et coshu are nondecreasing and that u < 
sinh u < u cosh u, then we obtain 

2d(x,a) \/—A(d(x, a) — a) d(x,a)—a 
r x > -TT, FT ■ , , r-= F^T, TV > 



C(p,5) sinh( v /z A2d(x,a)) ~ C(p, 5) cosh(2^ v /Z A) 
the same method as in the case A > yields 

Tx ~ C(p, 5) cosh(2p^A) + 1 

In conclusion, since cosh(2py / | A|) > 1, we have always 

> P- o 

Tx ~ C(p, 5) cosh(2p v ^Aj) + 1 

for every x £ B(a,p) \ B(a,a). Moreover, for those x G B(a,a), 

p — a 

r x > p — a > ; 

C(p,<5) cosh (2,9 v^Aj) + 1 

hence the desired result holds. 



□ 



Thanks to the above estimation, we get a practically useful version of 
Theorem HJ 
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Theorem 5. Let (a k ) k be a sequence in (0, 1] such that 

oo 

lim a k = and > = +oo 

fc^oo ^ — ' 
fc=0 

and Zei tfc = /3afc in the algorithm with 



0</5< P ~ a 



C(p, 5) cosh(2 / 9^A|) + 1 
then we have 



lim d(xk,%fl^) = and lim f{x k ) = /* 

fc— >oo 1 Ai— >oo 

Moreover, if (a k ) k verifies also that 

oo 
fc=0 

i/ien i/iere existes some m £ dJt^ such that x k — * m, when k — > oo. 

Proof. This is a simple corollary to Theorem H] and Lemma [TTJ □ 

Now we turn to the problem of error estimates of the subgradient algo- 
rithm under condition *. 

Proposition 4. Let condition * hold and the stepsize (t k ) k satisfy 

oo 

lim t k = and > t k = +oo 

fc=o 

T/ien iZiere exists Af EN, swc/i i/iai /or every k > N, 

d 2 (x k ,m) < b k 

where m is the unique median of p, and the sequence (b k ) k >N is defined by 

b N = {p + af and b k+1 = (1 - 2rt k )b k + C(p, 6)t 2 k , k>N 
which converges to when k — > oo. More explicitely, for every k > N , 

k k k 

b k+1 = (p + a) 2 H(l-2rt l ) + C(p,5){ £ t 2 „ 1 \{{l-2Tt l )+t 2 ) 

i=N j=N+l i=j 

Proof. Since t k — > 0, there exists N £ N, such that for every k > N, we 
have < 1. For every of these k, we choose a geodesic 7: [0, 1] — > 5(a, /j) 
with 7(0) = m and 7(1) = Then by Theorem [31 

/(a*) - /* > ^/(7( s ))| s=0+ + rd 2 (x k ,m) > Td 2 (x k ,m) 

combining this and Lemma [8] we get 

d 2 (x k+l , m) < (1 - 2Tt k )d 2 (x k ,m) + C(p, 5)t 2 k 

Proposition [5] yields that d 2 (xj\r,m) < (p + er) 2 = 6jv and then by induction 
it is easily seen that d 2 (x k ,m) < b k for every k > N. The same method as 
in the proof of Theorem H] can show that b k — > 0. In fact, we first show that 

lim inf b k = 

k— >oo 
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If this is not true, then there exist N\ > N and r] > such that for every k > 
N\ we have b k > rj. Since limfc_ ) . 00 t k = 0, we can suppose that C(p, S)t k < tt] 
for every k > N± and then 

h+i = b k + tk(C(p, 5)t k - 2rb k ) < b k + t k (C(p, S)t k - 2tt]) < b k - rr]t k 

by summing the above inequalities we get 

k 

rrj^2ti< b Nl - b k+ i < b Nl 
i=Ni 

which contradicts with ^j^-gifc = this proves the assertion. 
Then for every k > N, we consider the following two cases: 
If b k > C(p,5)t k /(2r), then we get 

h+i <b k - 2Tt k (C(p,5)t k /(2r)) + C(p,5)t 2 k = b k 

If b k < C(p, <5)tfc/(2r), then we get 

b k+1 = (1 - 2rt k )C(p, <%/(2r) + C(p, S)t 2 k = C(p, 8)t k /(2r) 

Thus we have always that 

bk+i < max{b k ,C(p,5)t k /(2T)} 

by induction we get for every n > k, 

b n +i < max{6 fe , (C(p, 8)/ (2r)) max{i fc , t k+i , t n }} 

thus we have 

limsup6 n < max{6fc, (C(p, 5)/(2t)) sup{tj : i > k}} 

n^oo 

In order to get b k — ► 0, it suffices to take the inferior limite on the right hand 
side. Finally, the explicit expression of (b k ) k follows from induction. □ 

We proceed to show that if the stepsize (t k ) k is chosen to be the harmonic 
series, then the rate of convergence of our algorithm is sublinear. To do this, 
we use the following Lemma in Nedic and Bertsekas |18j . 

Lemma 12. Let (u k ) k be a sequence of nonnegative real numbers satisfying 



where a and £ are positive constants. Then 

i ( Un | 2 "C(2- 



U k+ 1 < < 



M-o + q^) if0<a<l; 



C(i+ln(fc+i)) if a -l- 

fc+l IJ ut ±, 

(C+ ^t-f ) ifa>l; 



K (a-l)(fc+2) ^ ( k +2Y 
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Proposition 5. Let condition * hold and we choose t k = r/(k + l) for every 
k > with some constant r > 0, then we have 



d 2 (x k+ i,m) < < 



r 2 C(p,8) 



2 a r 2 C(p,S)(2-a) - 
1—a 



j^P(l + ln(fc+l)) 



(r*C(p,6) + ^0^^ : 



y (a-l)(fc+2) 

where m is the unique median of p and a = 2rr 

Proof. As in the proof of Proposition 01 we get that for every k > 

d 2 (x fc+ i,m) < (1 



i/0 < a < 1; 
if a = 1; 
if a > 1; 



2rr , 2 , r 2 C(p,5) 
-)d {x k ,m) + 



fc + l'~ v ~' w "" / ' (fc + l) 2 

then it suffices to use the preceding lemma with a = 2rr and £ = r 2 C(p, 5). 
Observe also that d(xo,m) < p + a by Proposition □ 
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